Canonical Correlation, an Approximation, and the Prediction of Protein Abundance
Abstract
This paper addresses a central problem in bioinformatics and proteomics: estimating the amount of each of the thousands of proteins in a cell culture or tissue sample. Although laboratory methods involving isotopes have been developed for this problem, we seek a simpler method, one that uses fewer laboratory procedures. Specifically, our aim is to use data-mining methods to infer protein levels from the relatively cheap and abundant data available from high-throughput tandem mass spectrometry (MS/MS). We develop and evaluate a method for this problem, based on a simple generative model of MS/MS data. We first show how to linearize the model and fit it to data using Canonical Correlation Analysis (CCA). Then, because CCA is computationally expensive for the large datasets we are dealing with, we develop an efficient approximation of CCA that exploits the structure of our data. We prove that the method is correct in that it achieves a well-defined optimization criterion. We also evaluate the method on several biological datasets, which were generated by MS/MS experiments performed on various mouse tissue samples.
Keywords: Bioinformatics, Proteomics, Data Mining, Machine Learning, Peptides, Tandem Mass Spectrometry.
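As a point of reference for the CCA step mentioned in the abstract, the following is a minimal sketch of textbook CCA computed by whitening each data block and taking an SVD. It is not the paper's efficient approximation, and it does not model MS/MS data; the function name `cca`, the ridge term `reg`, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def cca(X, Y, reg=1e-8):
    """Textbook CCA (illustrative sketch, not the paper's approximation).

    X: (n, p) array, Y: (n, q) array, rows are paired samples.
    Returns canonical correlations (descending) and weight matrices A, B
    whose columns are the canonical directions for X and Y.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance blocks; `reg` is a small ridge term (an assumption)
    # that keeps the Cholesky factorizations numerically stable.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whitening via Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map singular vectors back to the original coordinates.
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return s, A, B


# Synthetic demo: X and Y share one latent signal z (hypothetical data).
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X_demo = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y_demo = np.hstack([rng.normal(size=(500, 1)), -z + 0.1 * rng.normal(size=(500, 1))])
corrs, A_demo, B_demo = cca(X_demo, Y_demo)
```

The covariance blocks alone take on the order of n(p + q)^2 operations to form, and the factorizations add a term cubic in the number of variables, which is one way to see why plain CCA becomes costly on large datasets such as those the abstract describes.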
Similar resources
Effects of environmental factors on species diversity of rotifers using biodiversity indicators and canonical correlation analysis (CCA)
Rotifers, microscopic aquatic animals of the phylum Rotifera, live in a diverse range of aquatic habitats. They are important in the ecology of freshwater ecosystems because they recycle nutrients and can alter the trophic dynamics of planktonic communities. These features have also been used to infer environmental conditions in an aquatic habitat. Because of the important roles of this group of animals in the t...
Study of grain quality traits, glutenin subunits, and their relationships in durum wheat
To study grain quality traits and their relationships with high molecular weight (HMW) and low molecular weight (LMW) glutenin subunits, 104 durum wheat genotypes were used. Six grain quality characteristics were studied: wet and dry gluten content, test weight, grain hardness, protein content, and SDS sedimentation volume. HMW and LMW glutenin subunits were evaluated using SDS-polyac...
Analysis of the relationships between land slope percentage and position and barley yield components using the canonical correlation technique
The present study was conducted in the Mollaahmad watershed of Ardabil city in 1390 to evaluate the canonical correlation between tillage erosion factors (slope gradient and position) and the grain yield and yield components of barley (Sahand cultivar), in order to determine the covariability between the two sets of variables to (1) detect simultaneously occurring patterns in the interdependencies sets of canonic...
Canonical Analysis of the Relationship between Components of Professional Ethics and Dimensions of Social Responsibility
Background: Today, professional ethics and social responsibility play an important role in organizations. This study aimed to conduct a canonical analysis of the relationship between components of professional ethics and dimensions of social responsibility among the first high school teachers in the Naghadeh province. Method: In terms of purpose, this study is applied, and in terms of data collec...